Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC

Introduction

This course will focus on the use of models for understanding, predicting, and controlling infectious disease systems. Models play a central role in this work because they allow us to precisely and quantitatively express our ideas about the mechanisms of infectious disease transmission, immunity, and ecology. To the extent our understanding of the important mechanisms is correct, such models are extremely useful in the design of policy. On the other hand, models are useful in uncovering the important mechanisms, inasmuch as we can compare model predictions to data. Specifically, if we can translate competing hypotheses into mathematical models, these can be compared in terms of their ability to correctly capture the patterns seen in data.

In order to fairly compare competing models, we must first try to find out what is the best each model can do. Practically speaking, this means that we have to find the values of the models’ parameters that give the closest correspondence between model predictions and data. Parameter estimation can be important even when we are fairly confident in the ability of a single model to explain the dynamics. Not surprisingly, the all-important quantity \(R_0\) is frequently the focus of considerable parameter-estimation effort. Here, we’ll try our hand as estimating \(R_0\) and other model parameters from an epidemic curve using a couple of different methods.

Estimating \(R_0\) from features of the data

Estimating \(R_0\) from the final size

For a simple, closed SIR outbreak, we can derive an expression that determines the final size of the outbreak, i.e., the total number of hosts infected. To do this, note that if \(\frac{dX}{dt}=-\frac{\beta S I}{N}\) and \(\frac{dY}{dt}=\frac{\beta S I}{N}-\gamma\,I\), then \(\frac{dY}{dX}=-1+\frac{N}{R_0\,X},\) which we can integrate to yield \[X(0)-X(\infty)+\frac{N}{R_0}\,\log{\frac{X(\infty)}{X(0)}}=Y(\infty)-Y(0)=0.\] If \(X(0)=N\), then \(N-X(\infty)\) is the final size of the outbreak and the fraction ultimately infected is \(f=1-\frac{X(\infty)}{N}\). In terms of the latter, we have \[R_0=-\frac{\log{(1-f)}}{f}.\]

The following shows the relationship between final size and \(R_0\):

f <- seq(0,1,length=100)
R0 <- -log(1-f)/f
plot(f~R0,type='l',xlab=expression(R[0]),ylab="fraction infected",bty='l')

Estimating \(R_0\) in an invasion

During the early stages of an SIR outbreak, the number of infected individuals \(Y\) is approximately \[Y\;\approx\;Y_0\,e^{((R_0-1)\,(\gamma+\mu)\,t)}\] where \(Y_0\) is the (small) number of infectives at time \(0\), \(\frac{1}{\gamma}\) is the infectious period, and \(\frac{1}{\mu}\) is the host lifespan. Taking logs of both sides, we get \[\log{Y}\;\approx\;\log{Y_0}+(R_0-1)\,(\gamma+\mu)\,t,\] which implies that a semi-log plot of \(Y\) vs \(t\) should be approximately linear with a slope proportional to \(R_0-1\) and the recovery rate.

We can plot the 1977 boarding school influenza data (Anonymous 1978) in this way to see if this is the case.

url <- "http://kingaa.github.io/short-course/stochsim/bsflu_data.txt"
bsflu <- read.table(url,header=TRUE)
plot(B~day,data=bsflu,type='b',bty='l',
     main='boarding school influenza outbreak',
     xlab='day',ylab='Influenza cases')

plot(B~day,data=bsflu,type='b',log='y',bty='l',
     xlab='day',ylab='Influenza cases')

Plotted on a log scale, the linearity of the first several data points is indeed striking. This suggests that we can obtain a cheap and cheerful estimate of \(R_0\) by a simple linear regression:

fit <- lm(log(B)~day,data=subset(bsflu,day<=4))
summary(fit)
## 
## Call:
## lm(formula = log(B) ~ day, data = subset(bsflu, day <= 4))
## 
## Residuals:
## 1978-01-22 1978-01-23 1978-01-24 1978-01-25 
##    -0.1844     0.1736     0.2061    -0.1953 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -1.2493     0.3295  -3.792  0.06305 . 
## day           1.4338     0.1203  11.917  0.00697 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.269 on 2 degrees of freedom
## Multiple R-squared:  0.9861, Adjusted R-squared:  0.9792 
## F-statistic:   142 on 1 and 2 DF,  p-value: 0.006968
coef(fit)
## (Intercept)         day 
##   -1.249350    1.433772
slope <- coef(fit)[2]; slope
##      day 
## 1.433772

Now, we know that influenza’s infectious period is about 2.5 da. Moreover, since this is far shorter than an average human life (\(\mu\ll\gamma\)), we can neglect \(\mu\) in our estimating equation. Thus our estimate of \(R_0\) is \[\hat{R_{0}}\;=\;\mathrm{slope}/\gamma+1\;\approx\;{1.4}\times{2.5}+1\;\approx\;{4.6}.\]

Our strategy in this case has been to redefine the problem so that it fits a standard form, i.e., we showed how to rearrange the model so that the relevant quantity (\(R_0\)) could be obtained from linear regression. This means that all the usual diagnostics associated with linear regression are also available to check the fit of the model. We can get a rough estimate of the uncertainty in our estimate by looking at the standard errors in our estimator.

coef(summary(fit))
##              Estimate Std. Error   t value    Pr(>|t|)
## (Intercept) -1.249350  0.3294996 -3.791659 0.063050107
## day          1.433772  0.1203162 11.916692 0.006968359
slope.se <- coef(summary(fit))[2,2]
2.5*slope.se
## [1] 0.3007906

So we reckon we’ve got an error of \(\pm{0.3}\) in our estimate of \(R_0\), i.e., we feel pretty confident that \({3.98}<R_{0}<{5.19}\).

A defect of this method is that it uses only a small amount of the data to compute an important quantity. Moreover, we have to make a subjective judgement as to how much of the data to use. Further, as we use more data, and presumably obtain more precise estimates, we simulataneously get further from the realm where our approximation is valid, which introduces greater bias. Let’s see how our estimates of \(R_0\) depend on what we choose to be the ``initial phase’’ of the outbreak. Below, we estimate \(R_0\) and its standard error using the first 2, 3, 4, …, 10 data points.

days <- 2:10
slope <- numeric(length=length(days))
slope.se <- numeric(length=length(days))
for (k in seq_along(days)) {
  fit <- lm(log(B)~day,data=subset(bsflu,day<=days[k]))
  slope[k] <- coef(summary(fit))[2,1]
  slope.se[k] <- coef(summary(fit))[2,2]
}
R0.hat <- slope*2.5+1
R0.se <- slope.se*2.5
plot(slope~days,type='o')

We’ll plot these estimates against their uncertainties to show this precision-accuracy tradeoff.

plot(range(days),
     range(c(R0.hat-2*R0.se,R0.hat+2*R0.se),na.rm=T),
     type='n',bty='l',
     xlab="length of initial phase (da)",
     ylab=expression("estimated"~R[0]))
lines(R0.hat~days,type='o',lwd=2)
lines(R0.hat+2*R0.se~days,type='l')
lines(R0.hat-2*R0.se~days,type='l')

Exercise

Biweekly data for outbreaks of measles in three communities within Niamey, Niger (Grais et al. 2006) are provided in the file http://kingaa.github.io/parest/niamey.csv. To download and plot the data, do, e.g.,

niamey <- read.csv("http://kingaa.github.io/parest/niamey.csv",comment.char="#")
plot(measles~biweek,data=niamey,subset=community=="A",type='b')
plot(measles~biweek,data=niamey,subset=community=="B",type='b')
plot(measles~biweek,data=niamey,subset=community=="C",type='b')

Use both the final-size and the invasion rate methods to obtain estimates of \(R_0\) for measles using the data from each of the communities of Niamey, assuming that the infectious period is approximately two weeks.

Solving ODE in R

Here we begin our study of computational techniques for studying epidemiological models. In this session we introduce the numerical solution (or integration) of nonlinear differential equations using the sophisticated solvers found in the package deSolve. Numerical integration is one of the most important tools we have for the analysis of epidemiological models.

The SIR model

As we saw in the lecture, the classical SIR compartmental model divides a population of hosts into three classes: susceptible, infected, recovered. The model describes how the fraction of a population in each of these classes changes with time. Alternatively, the model can track the number of individuals in each class. Births are modeled as flows from “nowhere” into the susceptible class; deaths are modeled as flows from the S, I, or R compartment into “nowhere”. If \(S\), \(I\), and \(R\) refer to the fractions of indivduals in each compartment, then these state variables change according to the following system of differential equations: \[\frac{dS}{dt} = B-\lambda\,S-\mu\,S\] \[\frac{dI}{dt} = \lambda\,S-\gamma\,I-\mu\,I\] \[\frac{dR}{dt} = \gamma\,I-\mu\,R\] Here, \(B\) is the crude birth rate (births per unit time), \(\mu\) is the death rate and \(\gamma\) is the recovery rate. We’ll assume that the force of infection, \(\lambda\), has the form \[\lambda = \beta\,I\] so that the risk of infection a susceptible faces is proportional to the prevalence (the fraction of the population that is infected). This is known as the assumption of frequency-dependent transmission. Notice that we allow for the possibility of a contact rate, \(\beta(t)\), that varies in time.

ODE integrators in R

Like almost all epidemiological models, one can’t solve these equations analytically. However, we can compute the trajectories of a continuous-time model such as this one by integrating the equations numerically. Doing this accurately involves a lot of calculation, and there are smart ways and not-so-smart ways of going about it. This very common problem has been very thoroughly studied by numerical analysts for generations so that, when the equations are smooth, well-behaved functions, excellent numerical integration algorithms are readily available to compute approximate solutions to high precision. In particular, R has several sophisticated ODE solvers which (for many problems) will give highly accurate solutions. These algorithms are flexible, automatically perform checks, and give informative errors and warnings. To use the numerical differential equation solver package, we load the deSolve package

require(deSolve)

The ODE solver we’ll use is called ode. Let’s have a look at the help page for this function.

?ode

ode needs to know the initial values of the state variables (y), the times at which we want solutions, the right-hand side of the ODE func. The latter can optionally depend on some parameters (parms).

SIR for a closed epidemic

Let’s study the SIR model for a closed population, i.e., one in which we can neglect births and deaths. Recall that the differential equations for the closed epidemic are \[\frac{dS}{dt} = -\beta\,S\,I\] \[\frac{dI}{dt} = \beta\,S\,I-\gamma\,I\] \[\frac{dR}{dt} = \gamma\,I\] To encode these equations in a form suitable for use as the func argument to ode, we’ll need to write a function. For example:

closed.sir.model <- function (t, x, params) {
  ## first extract the state variables
  S <- x[1]
  I <- x[2]
  R <- x[3]
  ## now extract the parameters
  beta <- params["beta"]
  gamma <- params["gamma"]
  ## now code the model equations
  dSdt <- -beta*S*I
  dIdt <- beta*S*I-gamma*I
  dRdt <- gamma*I
  ## combine results into a single vector
  dxdt <- c(dSdt,dIdt,dRdt) 
  ## return result as a list!
  list(dxdt)                              
}

Note that the order and type of the arguments and output of this function must exactly match ode’s expectations. Thus, for instance, the time variable t must be the first argument even if, as is the case here, nothing in the function depends on time. [When the RHS of the ODE are independent of time, we say the ODE are autonomous.] Note also, that ode expects the values of the ODE RHS to be the first element of a list.

Now we can call ode to compute trajectories of the model. To do this, we’ll need some values of the parameters. If we’re thinking of a disease something like measles, and measuring time in years, we might use something like:

params <- c(beta=400,gamma=365/13)

Exercise

What is the infectious period of this disease? What is \(R_0\) in this case?

We now state the times at which we want solutions and specify the initial conditions, i.e., the starting values of the state variables \(S\), \(I\), and \(R\):

times <- seq(from=0,to=60/365,by=1/365/4) # returns a sequence
xstart <- c(S=0.999,I=0.001,R=0.000)     # initial conditions

Next, we compute a model trajectory with the ode command and store the result in a data-frame:

require(deSolve)

out <- as.data.frame(
                     ode(
                         func=closed.sir.model,
                         y=xstart,
                         times=times,
                         parms=params
                         )
                     )

and plot the results using the commands:

plot(I~time,data=out,type='l')

Exercise

Suppose that you’d rather measure time in days. Modify the parameters accordingly and verify your modifications.

Let’s study how the epidemic curve depends on the transmission rate, \(\beta\), and the infectious period. In particular, we’ll investigate how the epidemic curve changes as we vary \(\beta\) from 20 to 500 and the infectious period from 5 to 30 days.

betavals <- c(20,50,500)
ips <- c(5,10,30)
gammavals <- 365/ips

## set some plot parameters
op <- par(mgp=c(2,1,0),mar=c(3,3,1,1),mfrow=c(3,3))

for (beta in betavals) {
  for (gamma in gammavals) {
    params <- c(beta=beta,gamma=gamma)
    out <- as.data.frame(
                         ode(
                             func=closed.sir.model,
                             y=xstart,
                             times=times,
                             parms=params
                             )
                         )    
    title <- bquote(list(beta==.(beta),"IP"==.(365/gamma)~"da"))
    plot(I~time,data=out,type='l',main=title)
  }
}
par(op)      # restore old settings

Simulation is a useful tool, but its power is limited. The next exercise demonstrates the importance of being able to analyze the equations as well.

Exercise

For each of the above parameter combinations, describe the system’s behavior. Compute \(R_0\) for each parameter combination and relate it to the behavior of the system.

Exercise

Use the ODE solver to study the dependence of the epidemic’s final size on \(R_0\). Compare your results with the predictions of the final size equation \[1-R(\infty)=S(0)\,e^{-R(\infty)\,R_0}=e^{-R(\infty)\,R_0}\] solutions of which are plotted above.

SIR dynamics in an open population

Over a sufficiently short time scale, the assumption that the population is closed is reasonable. To capture the dynamics over the longer term, we’ll need to account for births and deaths, i.e., allow the population to be an open one. As we’ve seen, if we further assume that the birth rate equals the death rate, then the SIR equations become \[\frac{dS}{dt} = \mu -\beta\,S\,I-\mu\,S\] \[\frac{dI}{dt} = \beta\,S\,I-\gamma\,I-\mu\,I\] \[\frac{dR}{dt} = \gamma\,I-\mu\,R\]

We must modify the ODE function accordingly:

open.sir.model <- function (t, x, params) {
  beta <- params["beta"]
  mu <- params["mu"]
  gamma <- params["gamma"]
  dSdt <- mu*(1-x[1])-beta*x[1]*x[2]
  dIdt <- beta*x[1]*x[2]-(mu+gamma)*x[2]
  dRdt <- gamma*x[2]-mu*x[3]
  list(c(dSdt,dIdt,dRdt))
}

We’ll need to specify a birth/death rate in addition to the two parameters we specified before:

params <- c(mu=1/50,beta=400,gamma=365/13)

We integrate the equations as before:

times <- seq(from=0,to=25,by=1/365)
out <- as.data.frame(
                     ode(
                         func=open.sir.model,
                         y=xstart,
                         times=times,
                         parms=params
                         )
                     )

We can plot each of the state variables against time, and \(I\) against \(S\):

op <- par(fig=c(0,1,0,1),mfrow=c(2,2),
          mar=c(3,3,1,1),mgp=c(2,1,0))
plot(S~time,data=out,type='l',log='y')
plot(I~time,data=out,type='l',log='y')
plot(R~time,data=out,type='l',log='y')
plot(I~S,data=out,log='xy',pch='.',cex=0.5)
par(op)                               

Exercise

Explore the dynamics of the system for different values of the \(\beta\) and \(\gamma\) parameters by simulating and plotting trajectories as time series and in phase space (e.g., \(I\) vs. \(S\)). Use the same values of \(\beta\) and \(\gamma\) we looked at above. How does the value of \(R_0\) affect the results?

Exercise

Under the assumptions of this model, the average host lifespan is \(1/\mu\).
Explore how host lifespan affects the dynamics by integrating the differential equations for lifespans of 20 and 200 years.

The compartmental modeling strategy can be put to use in modeling a tremendous range of infections. The following exercises make some first steps in this direction.

Challenge

The SIR model assumes lifelong sterilizing immunity following infection. For many infections, immunity is not permanent. Make a compartment diagram for an SIRS model, in which individuals lose their immunity after some time. Write the corresponding differential equations and modify the above codes to study its dynamics. Compare the SIR and SIRS dynamics for the parameters \(\mu=1/50\), \(\gamma=365/13\), \(\beta=400\) and assuming that, in the SIRS model, immunity lasts for 10 years.

Challenge

Make a diagram, write the equations, and study the dynamics of the SEIR model for the dynamics of an infection with a latent period. Compare the dynamics of SIR and SEIR models for the parameters \(\mu=1/50\), \(\gamma=365/5\), \(\beta=1000\) and assuming that, in the SEIR model, the latent period has duration 8 days.

Nonautonomous equations

SIR with seasonal transmission

The simple SIR model always predicts damped oscillations towards an equilibrium (or pathogen extinction if \(R_0\) is too small). This is at odds with the recurrent outbreaks seen in many real pathogens. Sustained oscillations require some additional drivers in the model. An important driver in childhood infections of humans (e.g., measles) is seasonality in contact rates because of aggregation of children the during school term. We can analyze the consequences of this by assuming sinusoidal forcing on \(\beta\) according to \(\beta(t)=\beta_0\,(1+\beta_1\cos(2\,\pi\,t))\). We can modify the code presented above to solve the equations for a seasonally forced epidemic.

seas.sir.model <- function (t, x, params) {
  beta0 <- params["beta0"]
  beta1 <- params["beta1"]
  mu <- params["mu"]
  gamma <- params["gamma"]
  beta <- beta0*(1+beta1*cos(2*pi*t))
  dSdt <- mu*(1-x[1])-beta*x[1]*x[2]
  dIdt <- beta*x[1]*x[2]-(mu+gamma)*x[2]
  dRdt <- gamma*x[2]-mu*x[3]
  list(c(dSdt,dIdt,dRdt))
}

params <- c(mu=1/50,beta0=400,beta1=0.15,gamma=365/13)
xstart <- c(S=0.07,I=0.00039,R=0.92961)
times <- seq(from=0,to=30,by=7/365)
out <- as.data.frame(
                     ode(
                         func=seas.sir.model,
                         y=xstart,
                         times=times,
                         parms=params
                         )
                     )

op <- par(fig=c(0,1,0,1),mfrow=c(2,2),
          mar=c(3,3,1,1),mgp=c(2,1,0))
plot(S~time,data=out,type='l',log='y')
plot(I~time,data=out,type='l',log='y')
plot(R~time,data=out,type='l',log='y')
plot(I~S,data=out,log='xy',pch='.',cex=0.5)
par(op) 

Exercise

Explore the dynamics of the seasonally forced SIR model for increasing amplitude \(\beta_1\). Be sure to distinguish between transient and asymptotic dynamics.

Fitting deterministic dynamical epidemiological models to data

Now we move on to a much more general but considerably more complicated technique for estimating \(R_0\). The method of least squares gives us a way to quantify the discrepancy between the data and a model’s predictions. We can then search over all possible values of a model’s parameters to find the parameters that minimize this discrepancy.

We’ll illustrate this method using the Niamey data, which we’ll load and plot using the following commands:

# niamey <- read.csv("http://kingaa.github.io/parest/niamey.csv",comment.char="#")
niamey <- read.csv("niamey.csv",comment.char="#")
plot(measles~biweek,data=niamey,type='n')
lines(measles~biweek,data=subset(niamey,community=="A"),col=1)
lines(measles~biweek,data=subset(niamey,community=="B"),col=2)
lines(measles~biweek,data=subset(niamey,community=="C"),col=3)
legend("topleft",col=1:3,lty=1,bty='n',
       legend=paste("community",c("A","B","C")))

Since this is a single outbreak, and the number of births and deaths into the population over the course of the outbreak is small relative to the size of the population, we can treat this outbreak as if it were occurring in a closed population. We saw earlier how we can formulate the SIR equations in such a case and how we can solve the model equations to obtain trajectories. Before, we formulated the model in terms of the fractions, \(S\), \(I\), \(R\) of the host population in each compartment. Here, however, we need to track the numbers, \(X\), \(Y\), \(Z\), of individuals in the S, I, and R compartments, respectively. In terms of \(X\), \(Y\), \(Z\), our frequency-dependent SIR model is \[\frac{dX}{dt} = -\frac{\beta\,X\,Y}{N}\] \[\frac{dY}{dt} = \frac{\beta\,X\,Y}{N}-\gamma\,Y\] \[\frac{dZ}{dt} = \gamma\,Y\] Where \(N=X+Y+Z\) is the total host population size. A function suitable for use with deSolve for the closed SIR epidemic is:

require(deSolve)

closed.sir.model <- function (t, x, params) {
  X <- x[1]
  Y <- x[2]
  Z <- x[3]

  beta <- params["beta"]
  gamma <- params["gamma"]
  pop <- params["popsize"]

  dXdt <- -beta*X*Y/pop
  dYdt <- beta*X*Y/pop-gamma*Y
  dZdt <- gamma*Y

  list(c(dXdt,dYdt,dZdt))
}

Thus far, we have only considered deterministic models. In the next lab, we will begin to think about more realistic models that begin to take into account some aspects of the stochastic nature of real epidemics. For now, under the assumption that the epidemic is deterministic, parameter estimation is a matter of finding the model trajectory that gives the best fit to the data. The first thing we need is a function that computes a trajectory given parameters of the model.

prediction <- function (params, times) {
  xstart <- params[c("X.0","Y.0","Z.0")]
  out <- ode(
             func=closed.sir.model,
             y=xstart,
             times=times,
             parms=params
             )
  out[,3]     # return the number of infectives
}

Now we set up a function that will calculate the sum of the squared differences (or errors) between the data and the model predictions.

sse <- function (params, data) {
  times <- c(0,data$biweek/26)          # convert to years
  pred <- prediction(params,times)
  discrep <- pred[-1]-data$measles
  sum(discrep^2)                        # sum of squared errors
}

To get a sense of what this gives us, let’s explore varying some parameters and computing the SSE for community “A” of the Niamey data set. To begin with, we’ll assume we know that \(\gamma=365/13\) and that the initial numbers of susceptibles, infectives, and recovereds, \(X_0, Y_0, Z_0\) were 10000, 10, and 20000, respectively. We’ll write a little function that will plug a value of \(\beta\) into the parameter vector and compute the SSE.

dat <- subset(niamey,community=="A")
params <- c(X.0=10000,Y.0=10,Z.0=39990,popsize=50000,
            gamma=365/13,beta=NA)
f <- function (beta) {
  params["beta"] <- beta
  sse(params,dat)
}
beta <- seq(from=0,to=1000,by=5)
SSE <- sapply(beta,f)

We take our estimate, \(\hat{\beta}\) to be the value of \(\beta\) that gives the smallest SSE.

beta.hat <- beta[which.min(SSE)]

We can plot SSE vs. \(\beta\):

plot(beta,SSE,type='l')
abline(v=beta.hat,lty=2)
plot.window(c(0,1),c(0,1))
text(0.05,0.9,"A")

What does the SIR model predict at \(\beta=\hat{\beta}\)? We compute the model’s trajectory to find out:

params["beta"] <- beta.hat
plot(measles~biweek,data=dat)
lines(dat$biweek,prediction(params,dat$biweek/26))

Exercise

Use this method to obtain estimates of \(R_0\) for measles from each of the three communities in Niamey. You may again assume that the infectious period is approximately two weeks.

Clearly, this fit leaves much to be desired, but recall that we’ve here assumed that we know the correct values for all parameters but \(\beta\). In particular, we’ve assumed we know the infectious period, and the initial conditions. [NB: the initial value of \(Z\) is entirely irrelevant. Why?] Let’s see what happens when we try to estimate two parameters at once.

xdat <- subset(niamey,community=="A")
params <- c(X.0=NA,Y.0=10,Z.0=1,popsize=50000,
            gamma=365/13,beta=NA)
f <- function (beta, X.0) {
  params["beta"] <- beta
  params["X.0"] <- X.0 
  sse(params,dat)
  }
grid <- expand.grid(beta=seq(from=100,to=300,length=50),
                    X.0=seq(from=4000,to=20000,length=50))
grid$SSE <- with(grid,mapply(f,beta,X.0))

We can visualize this as a surface. A convenient function for this is contourplot from the lattice package:

library(lattice)
contourplot(sqrt(SSE)~beta+X.0,data=grid,cuts=30)

Exercise

Discuss the shape of this surface: what does it tell us about the uncertainty in the model’s parameters?

Challenge

Repeat the estimation using a closed SEIR model. Assume that the infectious period is 5 da and the latent period is 8 da. How and why does your estimate of \(R_0\) differ from that you obtained using the SIR model?

Optimization algorithms

When we have more than two parameters to estimate (as we usually will), we cannot rely on grid searches or simple graphical techniques to find the region of parameter space with the best parameters. We need more systematic ways of searching through the parameter space. Mathematicians have devoted much study to optimization algorithms, and there are many of these. Many of them are implemented in R.

The first place to go is the function optim, which implements several common, well-studied, generally-useful optimization algorithms.

?optim

To use it, we have to specify the function we want to minimize and a starting value for the parameters. Starting from this point, optim’s algorithms will search the parameter space for the value that minimizes the value of our objective function.

We’ll write an objective function to try to estimate \(\beta\), \(X_0\), and \(Y_0\) simultaneously. For the moment, we’ll continue to assume that the recovery rate \(\gamma\) is known.

dat <- subset(niamey,community=="A")
params <- c(X.0=NA,Y.0=NA,Z.0=1,popsize=50000,
            gamma=365/13,beta=NA)
f <- function (par) {
  params[c("X.0","Y.0","beta")] <- par
  sse(params,dat)
}
optim(fn=f,par=c(10000,10,220)) -> fit
fit
## $par
## [1] 9150.769987    1.017659  271.651714
## 
## $value
## [1] 166584.5
## 
## $counts
## function gradient 
##      273       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Exercise

In the foregoing, we’ve estimated parameters by minimizing the sum of squared differences between model-predicted number of cases and the data. What would happen if we tried to minimize the squared error on the log scale, i.e., to minimize \((\log(\mathrm{model})-\log(\mathrm{data}))^2\)? What would happen if we minimized the squared error on the square-root scale, i.e., \((\sqrt{\mathrm{model}}-\sqrt{\mathrm{data}})^2\)? What’s the ``right’’ scale to choose?

Challenge

Try to estimate all four parameters at once. Start your algorithm from several places to check that they all converge to the same place. You may find it useful to restart the optimizer to verify its convergence.

Challenge

The above plot of SSE against \(\beta\) shows a second local minimum of the SSE at a much higher value of \(\beta\). Why is this?

Many other optimization algorithms exist. optim implements several of these (see ?optim). Other functions and packages you might look into include: constrOptim, optimx, nlm, nlminb, nloptr (from the nloptr package), and subplex (from the subplex package).

The likelihood

We have seen that fitting mechanistic models to data is a powerful and general approach to estimating parameters. We saw too that least-squares fitting, is a straightforward way to do this. However, several issues arose. First, there was an element of arbitrariness in the choice of discrepancy measure. Second, although we could fairly easily obtain point estimates of model parameters using least-squares, it was not clear how we could obtain concomitant estimates of parameter uncertainty (e.g., confidence intervals). Finally, we began to see that there are limits to our ability to estimate parameters. In this lab, we’ll explore these issues, and see that likelihood offers an attractive resolution to the first and second of these, but that the third is a fundamental challenge.

Likelihood has many advantages:

  1. fidelity to model
  2. a deep and general theory
  3. a sound theoretical basis for confidence intervals and model selection
  4. statistical efficiency

and some disadvantages:

  1. fidelity to model
  2. fragility (lack of robustness)

General definition

Likelihood is the probability of a given set of data \(D\) having occurred under a particular hypothesis \(H\): \[\mathcal{L}(H,D)=\mathrm{Prob}[D|H]\]

A simple example: suppose \(n\) individuals participate in a serological survey and \(k\) of these individuals are found to be seropositive. One parameter of interest is the true fraction, \(p\), of the population that has seroconverted. Assuming the sample was drawn at random and the population is large, then the probability of the data (\(m\) of \(n\) individuals seropositive) given the hypothesis that the true probability is \(p\) is \[\mathrm{Prob}[D|H] = \binom{n}{k}\,p^k\,(1-p)^{n-k}.\]

If the true seroprevalence was, say, \(p=0.3\), what does the probability of observing \(k\) seropositives in a sample of size \(n=50\) look like?

p <- 0.3
n <- 50
k <- seq(0,50,by=1)
prob <- dbinom(x=k,size=n,prob=p) 
plot(k,prob,type='h',lwd=5,lend=1,
     ylab="probability")

The likelihood is a function of the unknown parameters. In this case, if we assume \(n\) is known, then the likelihood is a function of \(p\) alone: \[\mathcal{L}(p) = \binom{n}{k}\,p^k\,(1-p)^{n-k}\] Typically the logarithm of this function is more interesting than \(\mathcal{L}\) itself. Looking at this function for each of two different surveys:

k1 <- 18
n1 <- 50
p <- seq(0,1,by=0.001)
plot(p,dbinom(x=k1,size=n1,prob=p,log=TRUE),
     ylim=c(-10,-2),ylab="log-likelihood",
     type='l')
abline(h=dbinom(x=k1,size=n1,prob=k1/n1,log=TRUE)-
       0.5*qchisq(p=0.95,df=1),col='red')
abline(v=k1/n1,col='blue')

k2 <- 243
n2 <- 782
p <- seq(0,1,by=0.001)
plot(p,dbinom(x=k2,size=n2,prob=p,log=TRUE),
     ylim=c(-10,-2),ylab="log-likelihood",
     type='l')
abline(h=dbinom(x=k2,size=n2,prob=k2/n2,log=TRUE)-
       0.5*qchisq(p=0.95,df=1),col='red')
abline(v=k2/n2,col='blue')

In the above two plots, the likelihood is a function of the model parameter \(p\). Vertical lines show the maximum likelihood estimate (MLE) of \(p\). Horizontal lines show the critical likelihoods for the likelihood ratio test at the 95% confidence level.

Exercise

How do the two curves just plotted differ from one another? What features of the data are responsible for the differences?

From data points to data sets

Let’s suppose we have three samples, \(D_1, D_2, D_3\), taken by three different researchers, for the same large population. If these samples are independent, then \[\mathrm{Prob}[D|H] = \mathrm{Prob}[D_1|H]\times\mathrm{Prob}[D_2|H]\times\mathrm{Prob}[D_3|H],\] which means that the likelihood of the full data set is the product of the likelihoods from each of the samples. In other words, the likelihood gives a general recipe for combining data from different studies. We’d compute the likelihood as follows:

n <- c(13,484,3200)
k <- c(4,217,1118)
dbinom(x=k,size=n,prob=0.2,log=TRUE)
## [1]   -1.873761  -79.243371 -197.561806
sum(dbinom(x=k,size=n,prob=0.2,log=TRUE))
## [1] -278.6789
ll.fn <- function (p) {
  sum(dbinom(x=k,size=n,prob=p,log=TRUE))
}
p <- seq(0,1,by=0.001)
loglik <- sapply(p,ll.fn)
plot(p,loglik,type='l',ylim=max(loglik)+c(-10,0))

Fitting SIR to an epidemic curve using likelihood

Let’s revisit the model-fitting we did yesterday for the case of measles in Niger. We’ll simplify the model slightly to eliminate some unnecessary and wasteful elements. Our frequency-dependent SIR model, again, is \[\frac{dX}{dt} = -\frac{\beta\,X\,Y}{N}\] \[\frac{dY}{dt} = \frac{\beta\,X\,Y}{N}-\gamma\,Y\] \[\frac{dZ}{dt} = \gamma\,Y\] Notice that 1. the \(Z\) equation is irrelevant for the dynamics of the epidemic and we can drop it entirely, and 1. \(\beta\) only ever occurs in combination with \(N\), so we can combine these two into a single parameter by defining \(b=\beta/N\). We can modify the R codes we used before to take account of this.

require(deSolve)

closed.sir.model <- function (t, x, params) {
  inc <- params["b"]*x[1]*x[2]          # incidence
  list(c(-inc,inc-params["gamma"]*x[2]))
}
prediction <- function (params, times) {
  out <- ode(
             func=closed.sir.model,
             y=params[c("X.0","Y.0")],
             times=c(0,times),
             parms=params
             )
## return the Y variable only
## and discard Y(0)
  out[-1,3]
}

Earlier, we used SSE as a measure of the discrepancy between model predictions and data:

sse <- function (params, data) {
  times <- data$biweek/26               # convert to years
  pred <- prediction(params,times)
  discrep <- pred-data$measles
  sum(discrep^2)                        # sum of squared errors
}

Now let’s use likelihood instead. Let’s suppose that, when we record cases, we make errors that are normal. Here’s how we can compute the likelihood of the data given the model and its parameters:

loglik <- function (params, data) {
  times <- data$biweek/26
  pred <- prediction(params,times)
  sum(dnorm(x=data$measles,mean=pred,sd=params["sigma"],log=TRUE))
}
dat <- subset(niamey,community=="A")
params <- c(X.0=10000,Y.0=10,gamma=365/13,b=NA,sigma=1)

f <- function (b) {
  par <- params
  par["b"] <- b
  loglik(par,dat)
}

b <- seq(from=0,to=0.02,by=0.0001)
ll <- sapply(b,f)

We plot the results:

plot(b,-ll,type='l',ylab=expression(-log(L)))
b.hat <- b[which.max(ll)]
abline(v=b.hat,lty=2)

The great similarity in the likelihood estimate to our first least-squares estimate is no accident. Why is this? Let \(y_t\) be the observed number of infectives at time \(t\) and \(Y_t\) be the model’s prediction. Then the log likelihood is \[\log\mathrm{Prob}[y_t|Y_t] = \log{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,\exp{\left(-\frac{(y_t-Y_t)^2}{2\sigma^2}\right)}\right)} = -\tfrac{1}{2}\,\log{2\pi\sigma^2}-\tfrac{1}{2}\,\frac{(y_t-Y_t)^2}{\sigma^2}\] and \[\log\mathcal{L} = -\tfrac{1}{2}\,\left(\frac{1}{\sigma^2}\,\sum_{t}\!(y_t-Y_t)^2+\log{(\sigma^2)}+\log{(2\pi)}\right)\] So MLE and least-squares are equivalent if the errors are normal with constant variance!

Exercise

Suppose, alternatively, that the errors are log-normal with constant variance. Under what definition of SSE will least-squares and maximum likelihood give the same parameter estimates?

Modeling the noise

All this raises the question of what the best model for the errors really is. Of course, the answer will certainly depend on the nature of the data. The philosophy of likelihood encourages us to think about the question mechanistically. When the data, \(y_t\), are the result of a sampling process, for example, we can think of them as binomial samples \[y_t\;\sim\;\mathrm{binomial}\left(Y_t,\frac{n}{N}\right)\] where \(n\) is the sample size, \(N\) the population size, and \(Y_t\) is the true number of infections at time \(t\). Alternatively, we might think of \(y_t\) as Poisson samples \[y_t\;\sim\;\mathrm{Poisson}(p\,Y_t)\] where the parameter \(p\) reflects a combination of sampling efficiency and the detectability of infections. The latter leads to the following log-likelihood function

poisson.loglik <- function (params, data) {
  times <- data$biweek/26
  pred <- prediction(params,times)
  sum(dpois(x=data$measles,lambda=params["p"]*pred[-1],log=TRUE))
}

Let’s see what the MLE parameters are for this model. We’ll start by estimating just one parameter. Now, we must have \(b>0\). This is a constraint on the parameter. One way to enforce this constraint is by transforming the parameter so that it cannot ever be negative. We’ll log-transform \(b\).

dat <- subset(niamey,community=="A")
params <- c(X.0=20000,Y.0=1,gamma=365/13,b=NA,p=0.2)

## objective function (-log(L))
f <- function (log.b) {
  params[c("b")] <- exp(log.b) # un-transform 'b'
  -poisson.loglik(params,dat)
}

For something new, we’ll use the mle2 function from the bbmle package to maximize the likelihood. mle2 will employ an iterative algorithm for maximizing the likelihood. To get started, it needs an initial guess. It’s always a good idea to put a bit of thought into the guess. Since \(b=\beta/N=R_0/(\mathrm{IP}\,N)\), where \(\mathrm{IP}\) is the infectious period, and using guesses \(R_0\,\approx\,15\), \(\mathrm{IP}\,\approx\,13\) da, and \(N\,\approx\,20000\), we get \(b\,\approx\,15/(13{\times}20000)\,\mathrm{da}^{-1}\approx{0.02}~\mathrm{yr}^{-1}\).

require(bbmle)
guess <- list(log.b=log(0.01))

fit0 <- mle2(f,start=guess)
fit0
## 
## Call:
## mle2(minuslogl = f, start = guess)
## 
## Coefficients:
##     log.b 
## -5.977186 
## 
## Log-likelihood: -1963.32
fit <-  mle2(f,start=as.list(coef(fit0)))
fit
## 
## Call:
## mle2(minuslogl = f, start = as.list(coef(fit0)))
## 
## Coefficients:
##     log.b 
## -5.977186 
## 
## Log-likelihood: -1963.32

We can get an idea about the uncertainty and in particular obtain confidence intervals using the profile likelihood. To profile over a parameter, we fix the value of that parameter at each of several values, then maximize the likelihood over the remaining unknown parameters. In bblme, this is quite easy to obtain.

prof.b <- profile(fit)
plot(prof.b)

Now let’s try to estimate both \(b\) and the reporting probability \(p\). Since we have constraints on \(p\) (\(0 \le p \le 1\)), we’ll transform it as well. For this, the logit function and its inverse are useful: \[\mathrm{logit}(p)=\log{\frac{p}{1-p}} \qquad \mathrm{expit}(x)=\frac{1}{1+e^{-x}}.\]

dat <- subset(niamey,community=="A")
params <- c(X.0=20000,Y.0=1,gamma=365/13,b=NA,p=NA)

logit <- function (p) log(p/(1-p))      # the logit transform
expit <- function (x) 1/(1+exp(-x))    # inverse logit

f <- function (log.b, logit.p) {
  par <- params
  par[c("b","p")] <- c(exp(log.b),expit(logit.p))
  -poisson.loglik(par,dat)
}

guess <- list(log.b=log(0.005),logit.p=logit(0.2))
fit0 <- mle2(f,start=guess); fit0
## 
## Call:
## mle2(minuslogl = f, start = guess)
## 
## Coefficients:
##      log.b    logit.p 
## -5.9918688 -0.1470357 
## 
## Log-likelihood: -394.2
fit <-  mle2(f,start=as.list(coef(fit0))); fit
## 
## Call:
## mle2(minuslogl = f, start = as.list(coef(fit0)))
## 
## Coefficients:
##      log.b    logit.p 
## -5.9918687 -0.1470357 
## 
## Log-likelihood: -394.2
## now untransform the parameters:
mle <- with(
            as.list(coef(fit)),
            c(
              b=exp(log.b),
              p=expit(logit.p)
              )
            )
mle
##          b          p 
## 0.00249899 0.46330717
prof2 <- profile(fit)
plot(prof2)

We can also get confidence intervals:

ci <- confint(prof2)
ci
##              2.5 %      97.5 %
## log.b   -5.9951952 -5.98854089
## logit.p -0.1953059 -0.09803465
ci[1,] <- exp(ci[1,])
ci[2,] <- expit(ci[2,])
rownames(ci) <- c("b","p")
ci
##         2.5 %     97.5 %
## b 0.002490691 0.00250732
## p 0.451328148 0.47551095

Let’s make a contour plot to visualize the likelihood surface.

dat <- subset(niamey,community=="A")

## this time the objective function has to 
## take a vector argument
f <- function (pars) {
  par <- params
  par[c("b","p")] <- as.numeric(pars)
  poisson.loglik(par,dat)
}

b <- seq(from=0.001,to=0.005,length=50)
p <- seq(0,1,length=50)
grid <- expand.grid(b=b,p=p)
grid$loglik <- apply(grid,1,f)
grid <- subset(grid,is.finite(loglik))
require(lattice)
contourplot(loglik~b+p,data=grid,cuts=20)

The scale over which the log likelihood is varying is clearly huge relative to what is meaningful. Let’s focus in on the region around the MLE.

b <- seq(from=0.00245,to=0.00255,length=50)
p <- seq(0.44,0.49,length=50)
grid <- expand.grid(b=b,p=p)
grid$loglik <- apply(grid,1,f)
grid <- subset(grid,is.finite(loglik))
require(lattice)
contourplot(loglik~b+p,data=grid,cuts=20)

Let’s look at the model’s predictions at the MLE. The model is a probability distribution, so we should look at a number of simulations. An important question is: are the data a plausible sample from the predicted probability distribution?

params[c("b","p")] <- mle
times <- c(dat$biweek/26)
model.pred <- prediction(params,times)

nsim <- 1000
simdat <- replicate(
                    n=nsim,
                    rpois(n=length(model.pred),
                          lambda=params["p"]*model.pred)
                    )
quants <- t(apply(simdat,1,quantile,probs=c(0.025,0.5,0.975)))
matplot(times,quants,col="blue",lty=c(1,2,1),type='l') 
points(measles~times,data=dat,type='b',col='red')

Clearly the model is not doing a very good job of capturing the pattern in the data. It appears that we will need an error model that has the potential for more variability than does the Poisson. Recall that, under the Poisson assumption, the variance of the error is equal to the mean. The negative binomial distribution is such a distribution. Let’s explore the alternative assumption that \(y_t\) is negative-binomially distributed with mean \(p\,Y_t\), as before, but larger variance, \(p\,Y_t\,(1+\theta\,p\,Y_t)\), i.e., \[y_t\;\sim\;\mathrm{negbin}\left(\mathrm{mu}=p\,Y_t,\;\mathrm{size}=\frac{1}{\theta}\right)\]

loglik <- function (params, data) {
  times <- data$biweek/26
  pred <- prediction(params,times)
  sum(dnbinom(x=data$measles,
              mu=params["p"]*pred[-1],size=1/params["theta"],
              log=TRUE))
}

f <- function (log.b, logit.p, log.theta) {
  par <- params
  par[c("b","p","theta")] <- c(exp(log.b),
                               expit(logit.p),
                               exp(log.theta))
  -loglik(par,dat)
}

guess <- list(log.b=log(params["b"]),
              logit.p=logit(params["p"]),
              log.theta=log(1))
fit0 <- mle2(f,start=guess)
fit <-  mle2(f,start=as.list(coef(fit0)))
fit
## 
## Call:
## mle2(minuslogl = f, start = as.list(coef(fit0)))
## 
## Coefficients:
##     log.b   logit.p log.theta 
## -5.933817  1.407372 -0.594814 
## 
## Log-likelihood: -102.46
prof3 <- profile(fit)
plot(prof3)

mle <- with(
            as.list(coef(fit)),
            c(
              b=exp(log.b),
              p=expit(logit.p),
              theta=exp(log.theta)
              )
            )

params[c("b","p","theta")] <- mle
times <- c(dat$biweek/26)
model.pred <- prediction(params,times)

nsim <- 1000
simdat <- replicate(
                    n=nsim,
                    rnbinom(n=length(model.pred),
                            mu=params["p"]*model.pred,
                            size=1/params["theta"])
                    )
quants <- t(apply(simdat,1,quantile,probs=c(0.025,0.5,0.975)))
matplot(times,quants,col="blue",lty=c(1,2,1),type='l') 
lines(times,simdat[,1],col='black')
points(measles~times,data=dat,type='b',col='red')

What does this plot tell us? Essentially, the deterministic SIR model, as we’ve written it, cannot capture the shape of the epidemic. In order to fit the data, the optimization algorithm has expanded the error variance, to the point of absurdity. The typical model realization (in black) does not much resemble the data.

Exercise

Revisit the other communities of Niamey and/or the British boarding school influenza data using the Poisson model and bbmle.

Challenge

Try to estimate \(p\), \(b\), and \(X_0\) simultaneously.

Challenge

Reformulate the problem using the binomial error model. Modify the parameter estimation codes appropriately, estimate the parameters, and comment on the results.

Challenge

Reformulate the problem using a normal error model in which the variance is proportional to the mean: \[y_t\;\sim\;\mathrm{normal}\left(p\,Y_t,\sigma\,\sqrt{Y_t}\right).\] Modify the parameter estimation codes appropriately, estimate the parameters (including both \(p\) and \(\sigma\)), and comment on the results.

Challenge

We’ve been treating the Niamey data as if they were direct—though inaccurate—measurements of the prevalence. Actually, these are incidence data: they are measures of unique infections. It would be more appropriate to model these data by adding another equation \[\frac{dC}{dt} = \frac{\beta\,X\,Y}{N}\] to accumulate new infections and assuming the data are distributed according to, for example, \[y_t\;\sim\;\mathrm{Poisson}\left(p\,(C_t-C_{t-1})\right).\] Modify the codes above to reflect these more appropriate assumptions, estimate the parameters, and comment on the results.

Back to course homepage

R codes for this document

References

Anonymous. 1978. Influenza in a boarding school. British medical journal 1:587.

Grais, R. F., M. J. Ferrari, C. Dubray, O. N. Bjørnstad, B. T. Grenfell, A. Djibo, F. Fermon, and P. J. Guerin. 2006. Estimating transmission intensity for a measles epidemic in Niamey, Niger: Lessons for intervention. Transactions of the Royal Society of Tropical Medicine and Hygiene. 100:867–873.